{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Day 2 AM: Analyzing data with `dplyr`"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(tidyverse))\n",
"suppressPackageStartupMessages(library(stringr))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Chaining data transformations with pipe (`%>%`)\n",
"\n",
"We will operate on data incrementally, step by step. At each step, we take a `data.frame`, apply a function to it, and get back a new `data.frame`. That result can in turn be passed to another function, giving a chain of operations that each take a `data.frame` as input and return a `data.frame` as output. A convenient idiom (borrowed from the Unix shell) is to connect adjacent functions in the chain with a **pipe**, which takes the output of one function and feeds it in as the first argument of the next. In the tidyverse the **pipe** operator is written `%>%`; it comes from the `magrittr` package and is re-exported by `dplyr`, so `x %>% f(y)` means `f(x, y)`."
]
},
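{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of the pipe idea, here is a sketch comparing a nested call with the same two steps written as a pipe. The object names `nested` and `piped` are only illustrative. Because `x %>% f(y)` is rewritten as `f(x, y)`, both versions call the same functions with the same arguments and should compare as identical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Nested calls read inside-out: head() runs first, then tail()\n",
"nested <- tail(head(iris, n = 10), n = 5)\n",
"\n",
"# The pipe reads left to right: each result becomes the first argument of the next call\n",
"piped <- iris %>% head(n = 10) %>% tail(n = 5)\n",
"\n",
"identical(nested, piped)"
]
},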
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A simple piping example\n",
"\n",
"Here we first print the first 10 rows of the `iris` `data.frame` with an ordinary function call, then use piping to show only rows 6-10."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n",
"\t 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n",
"\t 5.0 & 3.6 & 1.4 & 0.2 & setosa\\\\\n",
"\t 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n",
"\t 4.6 & 3.4 & 1.4 & 0.3 & setosa\\\\\n",
"\t 5.0 & 3.4 & 1.5 & 0.2 & setosa\\\\\n",
"\t 4.4 & 2.9 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.9 & 3.1 & 1.5 & 0.1 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | setosa | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n",
"| 4.6 | 3.1 | 1.5 | 0.2 | setosa | \n",
"| 5.0 | 3.6 | 1.4 | 0.2 | setosa | \n",
"| 5.4 | 3.9 | 1.7 | 0.4 | setosa | \n",
"| 4.6 | 3.4 | 1.4 | 0.3 | setosa | \n",
"| 5.0 | 3.4 | 1.5 | 0.2 | setosa | \n",
"| 4.4 | 2.9 | 1.4 | 0.2 | setosa | \n",
"| 4.9 | 3.1 | 1.5 | 0.1 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n",
"1 5.1 3.5 1.4 0.2 setosa \n",
"2 4.9 3.0 1.4 0.2 setosa \n",
"3 4.7 3.2 1.3 0.2 setosa \n",
"4 4.6 3.1 1.5 0.2 setosa \n",
"5 5.0 3.6 1.4 0.2 setosa \n",
"6 5.4 3.9 1.7 0.4 setosa \n",
"7 4.6 3.4 1.4 0.3 setosa \n",
"8 5.0 3.4 1.5 0.2 setosa \n",
"9 4.4 2.9 1.4 0.2 setosa \n",
"10 4.9 3.1 1.5 0.1 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(iris, n=10)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" & Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t6 & 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n",
"\t7 & 4.6 & 3.4 & 1.4 & 0.3 & setosa\\\\\n",
"\t8 & 5.0 & 3.4 & 1.5 & 0.2 & setosa\\\\\n",
"\t9 & 4.4 & 2.9 & 1.4 & 0.2 & setosa\\\\\n",
"\t10 & 4.9 & 3.1 & 1.5 & 0.1 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|---|\n",
"| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa | \n",
"| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa | \n",
"| 8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa | \n",
"| 9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa | \n",
"| 10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n",
"6 5.4 3.9 1.7 0.4 setosa \n",
"7 4.6 3.4 1.4 0.3 setosa \n",
"8 5.0 3.4 1.5 0.2 setosa \n",
"9 4.4 2.9 1.4 0.2 setosa \n",
"10 4.9 3.1 1.5 0.1 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% head(n=10) %>% tail(n=5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering rows with `filter`"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 7.0 & 3.2 & 4.7 & 1.4 & versicolor\\\\\n",
"\t 6.4 & 3.2 & 4.5 & 1.5 & versicolor\\\\\n",
"\t 6.9 & 3.1 & 4.9 & 1.5 & versicolor\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 7.0 | 3.2 | 4.7 | 1.4 | versicolor | \n",
"| 6.4 | 3.2 | 4.5 | 1.5 | versicolor | \n",
"| 6.9 | 3.1 | 4.9 | 1.5 | versicolor | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n",
"1 7.0 3.2 4.7 1.4 versicolor\n",
"2 6.4 3.2 4.5 1.5 versicolor\n",
"3 6.9 3.1 4.9 1.5 versicolor"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% filter(Species == \"versicolor\") %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 7.0 & 3.2 & 4.7 & 1.4 & versicolor\\\\\n",
"\t 6.4 & 3.2 & 4.5 & 1.5 & versicolor\\\\\n",
"\t 6.9 & 3.1 & 4.9 & 1.5 & versicolor\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 7.0 | 3.2 | 4.7 | 1.4 | versicolor | \n",
"| 6.4 | 3.2 | 4.5 | 1.5 | versicolor | \n",
"| 6.9 | 3.1 | 4.9 | 1.5 | versicolor | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n",
"1 7.0 3.2 4.7 1.4 versicolor\n",
"2 6.4 3.2 4.5 1.5 versicolor\n",
"3 6.9 3.1 4.9 1.5 versicolor"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% filter(Sepal.Length > 6) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 6.3 & 3.3 & 6.0 & 2.5 & virginica\\\\\n",
"\t 7.1 & 3.0 & 5.9 & 2.1 & virginica\\\\\n",
"\t 6.3 & 2.9 & 5.6 & 1.8 & virginica\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 6.3 | 3.3 | 6.0 | 2.5 | virginica | \n",
"| 7.1 | 3.0 | 5.9 | 2.1 | virginica | \n",
"| 6.3 | 2.9 | 5.6 | 1.8 | virginica | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n",
"1 6.3 3.3 6.0 2.5 virginica\n",
"2 7.1 3.0 5.9 2.1 virginica\n",
"3 6.3 2.9 5.6 1.8 virginica"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% filter((Sepal.Length > 6) & (Species == \"virginica\")) %>% head(3)"
]
},
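{
"cell_type": "markdown",
"metadata": {},
"source": [
"`filter()` also accepts several conditions as separate, comma-separated arguments, and combines them with `&`. A sketch checking that the comma form returns the same rows as the explicit `&` form above (`with_and` and `with_commas` are illustrative names):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Conditions joined explicitly with &\n",
"with_and <- iris %>% filter((Sepal.Length > 6) & (Species == \"virginica\"))\n",
"\n",
"# Separate arguments to filter() are also combined with &\n",
"with_commas <- iris %>% filter(Sepal.Length > 6, Species == \"virginica\")\n",
"\n",
"identical(with_and, with_commas)"
]
},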
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 7.0 & 3.2 & 4.7 & 1.4 & versicolor\\\\\n",
"\t 6.4 & 3.2 & 4.5 & 1.5 & versicolor\\\\\n",
"\t 6.9 & 3.1 & 4.9 & 1.5 & versicolor\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 7.0 | 3.2 | 4.7 | 1.4 | versicolor | \n",
"| 6.4 | 3.2 | 4.5 | 1.5 | versicolor | \n",
"| 6.9 | 3.1 | 4.9 | 1.5 | versicolor | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n",
"1 7.0 3.2 4.7 1.4 versicolor\n",
"2 6.4 3.2 4.5 1.5 versicolor\n",
"3 6.9 3.1 4.9 1.5 versicolor"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% filter(Sepal.Length > mean(Sepal.Length)) %>% head(3)"
]
},
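{
"cell_type": "markdown",
"metadata": {},
"source": [
"On an ungrouped data frame like `iris`, `mean(Sepal.Length)` inside `filter()` is computed over the whole column, so every row is compared with one overall mean. A sketch that computes the threshold first and checks that the result is the same (`threshold`, `above_inline` and `above_precomp` are illustrative names):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Overall mean, computed once outside the pipe\n",
"threshold <- mean(iris$Sepal.Length)\n",
"\n",
"above_inline  <- iris %>% filter(Sepal.Length > mean(Sepal.Length))\n",
"above_precomp <- iris %>% filter(Sepal.Length > threshold)\n",
"\n",
"identical(above_inline, above_precomp)"
]
},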
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 6.3 & 3.3 & 6.0 & 2.5 & virginica\\\\\n",
"\t 5.8 & 2.7 & 5.1 & 1.9 & virginica\\\\\n",
"\t 7.1 & 3.0 & 5.9 & 2.1 & virginica\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 6.3 | 3.3 | 6.0 | 2.5 | virginica | \n",
"| 5.8 | 2.7 | 5.1 | 1.9 | virginica | \n",
"| 7.1 | 3.0 | 5.9 | 2.1 | virginica | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species \n",
"1 6.3 3.3 6.0 2.5 virginica\n",
"2 5.8 2.7 5.1 1.9 virginica\n",
"3 7.1 3.0 5.9 2.1 virginica"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% filter(str_detect(Species, \"virgin\")) %>% head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Slicing rows by index\n",
"\n",
"Base R indexing (for example `iris[c(2:4, 6:8), ]`) can select rows by position too, but `slice` fits more naturally into a chain of piped commands."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n",
"\t 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n",
"\t 5.4 & 3.9 & 1.7 & 0.4 & setosa\\\\\n",
"\t 4.6 & 3.4 & 1.4 & 0.3 & setosa\\\\\n",
"\t 5.0 & 3.4 & 1.5 & 0.2 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n",
"| 4.6 | 3.1 | 1.5 | 0.2 | setosa | \n",
"| 5.4 | 3.9 | 1.7 | 0.4 | setosa | \n",
"| 4.6 | 3.4 | 1.4 | 0.3 | setosa | \n",
"| 5.0 | 3.4 | 1.5 | 0.2 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n",
"1 4.9 3.0 1.4 0.2 setosa \n",
"2 4.7 3.2 1.3 0.2 setosa \n",
"3 4.6 3.1 1.5 0.2 setosa \n",
"4 5.4 3.9 1.7 0.4 setosa \n",
"5 4.6 3.4 1.4 0.3 setosa \n",
"6 5.0 3.4 1.5 0.2 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% slice(c(2:4, 6:8))"
]
},
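{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, a sketch of the same row selection with base R indexing. In the output above the sliced rows are numbered from 1, whereas `[` keeps the original row names (2, 3, 4, 6, 7, 8), so the sketch resets row names on both results before comparing the data values. The names `rows`, `by_slice` and `by_index` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"rows <- c(2:4, 6:8)\n",
"\n",
"# dplyr: keep the rows at the given positions\n",
"by_slice <- iris %>% slice(rows)\n",
"\n",
"# Base R: the same positions, all columns\n",
"by_index <- iris[rows, ]\n",
"\n",
"# Reset row names so only the data values are compared\n",
"rownames(by_slice) <- NULL\n",
"rownames(by_index) <- NULL\n",
"\n",
"identical(by_slice, by_index)"
]
},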
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting columns with `select`"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llll}\n",
" Petal.Length & Petal.Width & Sepal.Length & Sepal.Width\\\\\n",
"\\hline\n",
"\t 1.4 & 0.2 & 5.1 & 3.5\\\\\n",
"\t 1.4 & 0.2 & 4.9 & 3.0\\\\\n",
"\t 1.3 & 0.2 & 4.7 & 3.2\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Petal.Length | Petal.Width | Sepal.Length | Sepal.Width | \n",
"|---|---|---|---|\n",
"| 1.4 | 0.2 | 5.1 | 3.5 | \n",
"| 1.4 | 0.2 | 4.9 | 3.0 | \n",
"| 1.3 | 0.2 | 4.7 | 3.2 | \n",
"\n",
"\n"
],
"text/plain": [
" Petal.Length Petal.Width Sepal.Length Sepal.Width\n",
"1 1.4 0.2 5.1 3.5 \n",
"2 1.4 0.2 4.9 3.0 \n",
"3 1.3 0.2 4.7 3.2 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% select(c(Petal.Length, Petal.Width, Sepal.Length, Sepal.Width)) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llll}\n",
" Petal.Length & Petal.Width & Sepal.Length & Sepal.Width\\\\\n",
"\\hline\n",
"\t 1.4 & 0.2 & 5.1 & 3.5\\\\\n",
"\t 1.4 & 0.2 & 4.9 & 3.0\\\\\n",
"\t 1.3 & 0.2 & 4.7 & 3.2\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Petal.Length | Petal.Width | Sepal.Length | Sepal.Width | \n",
"|---|---|---|---|\n",
"| 1.4 | 0.2 | 5.1 | 3.5 | \n",
"| 1.4 | 0.2 | 4.9 | 3.0 | \n",
"| 1.3 | 0.2 | 4.7 | 3.2 | \n",
"\n",
"\n"
],
"text/plain": [
" Petal.Length Petal.Width Sepal.Length Sepal.Width\n",
"1 1.4 0.2 5.1 3.5 \n",
"2 1.4 0.2 4.9 3.0 \n",
"3 1.3 0.2 4.7 3.2 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% select(c(3,4,1,2)) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|llll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2\\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2\\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | \n",
"|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width\n",
"1 5.1 3.5 1.4 0.2 \n",
"2 4.9 3.0 1.4 0.2 \n",
"3 4.7 3.2 1.3 0.2 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% select(-Species) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" Sepal.Length & Petal.Length\\\\\n",
"\\hline\n",
"\t 5.1 & 1.4\\\\\n",
"\t 4.9 & 1.4\\\\\n",
"\t 4.7 & 1.3\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Petal.Length | \n",
"|---|---|\n",
"| 5.1 | 1.4 | \n",
"| 4.9 | 1.4 | \n",
"| 4.7 | 1.3 | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Petal.Length\n",
"1 5.1 1.4 \n",
"2 4.9 1.4 \n",
"3 4.7 1.3 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% select(contains(\"Length\")) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" Sepal.Length & Sepal.Width & Species\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & setosa\\\\\n",
"\t 4.9 & 3.0 & setosa\\\\\n",
"\t 4.7 & 3.2 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Species | \n",
"|---|---|---|\n",
"| 5.1 | 3.5 | setosa | \n",
"| 4.9 | 3.0 | setosa | \n",
"| 4.7 | 3.2 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Species\n",
"1 5.1 3.5 setosa \n",
"2 4.9 3.0 setosa \n",
"3 4.7 3.2 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% select(starts_with(\"S\")) %>% head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Renaming columns"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Type\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Type | \n",
"|---|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | setosa | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Type \n",
"1 5.1 3.5 1.4 0.2 setosa\n",
"2 4.9 3.0 1.4 0.2 setosa\n",
"3 4.7 3.2 1.3 0.2 setosa"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% rename(Type = Species) %>% head(3)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" SL & SW & PL & PW & Species\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"SL | SW | PL | PW | Species | \n",
"|---|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | setosa | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" SL SW PL PW Species\n",
"1 5.1 3.5 1.4 0.2 setosa \n",
"2 4.9 3.0 1.4 0.2 setosa \n",
"3 4.7 3.2 1.3 0.2 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% rename(SL=Sepal.Length, SW=Sepal.Width, PW=Petal.Width, PL=Petal.Length) %>% head(3)"
]
},
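{
"cell_type": "markdown",
"metadata": {},
"source": [
"`select()` can also rename columns using the same `new = old` syntax, but unlike `rename()` it drops every column it does not mention. A sketch contrasting the two (`renamed` and `selected` are illustrative names):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# rename() keeps all five columns, changing only the one named\n",
"renamed <- iris %>% rename(SL = Sepal.Length)\n",
"\n",
"# select() with new = old keeps only the columns it lists\n",
"selected <- iris %>% select(SL = Sepal.Length, Species)\n",
"\n",
"names(renamed)\n",
"names(selected)"
]
},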
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sorting data with `arrange`"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 4.3 & 3.0 & 1.1 & 0.1 & setosa\\\\\n",
"\t 4.4 & 2.9 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.4 & 3.0 & 1.3 & 0.2 & setosa\\\\\n",
"\t 4.4 & 3.2 & 1.3 & 0.2 & setosa\\\\\n",
"\t 4.5 & 2.3 & 1.3 & 0.3 & setosa\\\\\n",
"\t 4.6 & 3.1 & 1.5 & 0.2 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 4.3 | 3.0 | 1.1 | 0.1 | setosa | \n",
"| 4.4 | 2.9 | 1.4 | 0.2 | setosa | \n",
"| 4.4 | 3.0 | 1.3 | 0.2 | setosa | \n",
"| 4.4 | 3.2 | 1.3 | 0.2 | setosa | \n",
"| 4.5 | 2.3 | 1.3 | 0.3 | setosa | \n",
"| 4.6 | 3.1 | 1.5 | 0.2 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n",
"1 4.3 3.0 1.1 0.1 setosa \n",
"2 4.4 2.9 1.4 0.2 setosa \n",
"3 4.4 3.0 1.3 0.2 setosa \n",
"4 4.4 3.2 1.3 0.2 setosa \n",
"5 4.5 2.3 1.3 0.3 setosa \n",
"6 4.6 3.1 1.5 0.2 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% arrange(Sepal.Length) %>% head"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 4.3 & 3.0 & 1.1 & 0.1 & setosa\\\\\n",
"\t 4.4 & 3.2 & 1.3 & 0.2 & setosa\\\\\n",
"\t 4.4 & 3.0 & 1.3 & 0.2 & setosa\\\\\n",
"\t 4.4 & 2.9 & 1.4 & 0.2 & setosa\\\\\n",
"\t 4.5 & 2.3 & 1.3 & 0.3 & setosa\\\\\n",
"\t 4.6 & 3.6 & 1.0 & 0.2 & setosa\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 4.3 | 3.0 | 1.1 | 0.1 | setosa | \n",
"| 4.4 | 3.2 | 1.3 | 0.2 | setosa | \n",
"| 4.4 | 3.0 | 1.3 | 0.2 | setosa | \n",
"| 4.4 | 2.9 | 1.4 | 0.2 | setosa | \n",
"| 4.5 | 2.3 | 1.3 | 0.3 | setosa | \n",
"| 4.6 | 3.6 | 1.0 | 0.2 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n",
"1 4.3 3.0 1.1 0.1 setosa \n",
"2 4.4 3.2 1.3 0.2 setosa \n",
"3 4.4 3.0 1.3 0.2 setosa \n",
"4 4.4 2.9 1.4 0.2 setosa \n",
"5 4.5 2.3 1.3 0.3 setosa \n",
"6 4.6 3.6 1.0 0.2 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% arrange(Sepal.Length, desc(Sepal.Width)) %>% head"
]
},
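{
"cell_type": "markdown",
"metadata": {},
"source": [
"`desc()` can also be used on its own to sort from largest to smallest. A sketch that sorts by descending sepal length and checks that the first row holds the maximum (`longest_first` is an illustrative name):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Largest Sepal.Length first\n",
"longest_first <- iris %>% arrange(desc(Sepal.Length))\n",
"\n",
"head(longest_first, 3)\n",
"\n",
"# The first row should hold the overall maximum\n",
"longest_first$Sepal.Length[1] == max(iris$Sepal.Length)"
]
},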
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating new columns with `mutate` and `transmute`"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | Comb.Length | Comb.Width |
\n",
"\n",
"\t5.1 | 3.5 | 1.4 | 0.2 | setosa | 6.5 | 3.7 |
\n",
"\t4.9 | 3.0 | 1.4 | 0.2 | setosa | 6.3 | 3.2 |
\n",
"\t4.7 | 3.2 | 1.3 | 0.2 | setosa | 6.0 | 3.4 |
\n",
"\t4.6 | 3.1 | 1.5 | 0.2 | setosa | 6.1 | 3.3 |
\n",
"\t5.0 | 3.6 | 1.4 | 0.2 | setosa | 6.4 | 3.8 |
\n",
"\t5.4 | 3.9 | 1.7 | 0.4 | setosa | 7.1 | 4.3 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species & Comb.Length & Comb.Width\\\\\n",
"\\hline\n",
"\t 5.1 & 3.5 & 1.4 & 0.2 & setosa & 6.5 & 3.7 \\\\\n",
"\t 4.9 & 3.0 & 1.4 & 0.2 & setosa & 6.3 & 3.2 \\\\\n",
"\t 4.7 & 3.2 & 1.3 & 0.2 & setosa & 6.0 & 3.4 \\\\\n",
"\t 4.6 & 3.1 & 1.5 & 0.2 & setosa & 6.1 & 3.3 \\\\\n",
"\t 5.0 & 3.6 & 1.4 & 0.2 & setosa & 6.4 & 3.8 \\\\\n",
"\t 5.4 & 3.9 & 1.7 & 0.4 & setosa & 7.1 & 4.3 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | Comb.Length | Comb.Width | \n",
"|---|---|---|---|---|---|---|\n",
"| 5.1 | 3.5 | 1.4 | 0.2 | setosa | 6.5 | 3.7 | \n",
"| 4.9 | 3.0 | 1.4 | 0.2 | setosa | 6.3 | 3.2 | \n",
"| 4.7 | 3.2 | 1.3 | 0.2 | setosa | 6.0 | 3.4 | \n",
"| 4.6 | 3.1 | 1.5 | 0.2 | setosa | 6.1 | 3.3 | \n",
"| 5.0 | 3.6 | 1.4 | 0.2 | setosa | 6.4 | 3.8 | \n",
"| 5.4 | 3.9 | 1.7 | 0.4 | setosa | 7.1 | 4.3 | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species Comb.Length\n",
"1 5.1 3.5 1.4 0.2 setosa 6.5 \n",
"2 4.9 3.0 1.4 0.2 setosa 6.3 \n",
"3 4.7 3.2 1.3 0.2 setosa 6.0 \n",
"4 4.6 3.1 1.5 0.2 setosa 6.1 \n",
"5 5.0 3.6 1.4 0.2 setosa 6.4 \n",
"6 5.4 3.9 1.7 0.4 setosa 7.1 \n",
" Comb.Width\n",
"1 3.7 \n",
"2 3.2 \n",
"3 3.4 \n",
"4 3.3 \n",
"5 3.8 \n",
"6 4.3 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% mutate(Comb.Length=Sepal.Length + Petal.Length, \n",
" Comb.Width = Sepal.Width + Petal.Width) %>% head"
]
},
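{
"cell_type": "markdown",
"metadata": {},
"source": [
"Within a single `mutate()` call, later expressions can refer to columns created earlier in the same call. A sketch that builds on the two combined columns above; `Comb.Total` is a hypothetical extra column used only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"combined <- iris %>%\n",
"  mutate(Comb.Length = Sepal.Length + Petal.Length,\n",
"         Comb.Width  = Sepal.Width + Petal.Width,\n",
"         # Comb.Total (hypothetical) reuses the two columns created just above\n",
"         Comb.Total  = Comb.Length + Comb.Width)\n",
"\n",
"head(combined, 3)"
]
},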
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mutating only the columns where a predicate is TRUE with `mutate_if`"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
\n",
"\n",
"\t1.629241 | 1.252763 | 0.3364722 | -1.6094379 | setosa |
\n",
"\t1.589235 | 1.098612 | 0.3364722 | -1.6094379 | setosa |
\n",
"\t1.547563 | 1.163151 | 0.2623643 | -1.6094379 | setosa |
\n",
"\t1.526056 | 1.131402 | 0.4054651 | -1.6094379 | setosa |
\n",
"\t1.609438 | 1.280934 | 0.3364722 | -1.6094379 | setosa |
\n",
"\t1.686399 | 1.360977 | 0.5306283 | -0.9162907 | setosa |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 1.629241 & 1.252763 & 0.3364722 & -1.6094379 & setosa \\\\\n",
"\t 1.589235 & 1.098612 & 0.3364722 & -1.6094379 & setosa \\\\\n",
"\t 1.547563 & 1.163151 & 0.2623643 & -1.6094379 & setosa \\\\\n",
"\t 1.526056 & 1.131402 & 0.4054651 & -1.6094379 & setosa \\\\\n",
"\t 1.609438 & 1.280934 & 0.3364722 & -1.6094379 & setosa \\\\\n",
"\t 1.686399 & 1.360977 & 0.5306283 & -0.9162907 & setosa \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 1.629241 | 1.252763 | 0.3364722 | -1.6094379 | setosa | \n",
"| 1.589235 | 1.098612 | 0.3364722 | -1.6094379 | setosa | \n",
"| 1.547563 | 1.163151 | 0.2623643 | -1.6094379 | setosa | \n",
"| 1.526056 | 1.131402 | 0.4054651 | -1.6094379 | setosa | \n",
"| 1.609438 | 1.280934 | 0.3364722 | -1.6094379 | setosa | \n",
"| 1.686399 | 1.360977 | 0.5306283 | -0.9162907 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n",
"1 1.629241 1.252763 0.3364722 -1.6094379 setosa \n",
"2 1.589235 1.098612 0.3364722 -1.6094379 setosa \n",
"3 1.547563 1.163151 0.2623643 -1.6094379 setosa \n",
"4 1.526056 1.131402 0.4054651 -1.6094379 setosa \n",
"5 1.609438 1.280934 0.3364722 -1.6094379 setosa \n",
"6 1.686399 1.360977 0.5306283 -0.9162907 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% mutate_if(is.numeric, log) %>% head"
]
},
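{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since dplyr 1.0.0, the scoped variants such as `mutate_if()` are superseded (they still work) by `across()`. A sketch of the same transformation written with `across()` and the `where()` selection helper, assuming dplyr 1.0.0 or newer; `scoped` and `with_across` are illustrative names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Scoped form: apply log() to every column for which is.numeric() is TRUE\n",
"scoped <- iris %>% mutate_if(is.numeric, log)\n",
"\n",
"# across() form: where() selects columns by the same predicate\n",
"with_across <- iris %>% mutate(across(where(is.numeric), log))\n",
"\n",
"all.equal(scoped, with_across)"
]
},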
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mutating columns selected by name with `mutate_at`"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
\n",
"\n",
"\t1.629241 | 3.5 | 0.3364722 | 0.2 | setosa |
\n",
"\t1.589235 | 3.0 | 0.3364722 | 0.2 | setosa |
\n",
"\t1.547563 | 3.2 | 0.2623643 | 0.2 | setosa |
\n",
"\t1.526056 | 3.1 | 0.4054651 | 0.2 | setosa |
\n",
"\t1.609438 | 3.6 | 0.3364722 | 0.2 | setosa |
\n",
"\t1.686399 | 3.9 | 0.5306283 | 0.4 | setosa |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width & Species\\\\\n",
"\\hline\n",
"\t 1.629241 & 3.5 & 0.3364722 & 0.2 & setosa \\\\\n",
"\t 1.589235 & 3.0 & 0.3364722 & 0.2 & setosa \\\\\n",
"\t 1.547563 & 3.2 & 0.2623643 & 0.2 & setosa \\\\\n",
"\t 1.526056 & 3.1 & 0.4054651 & 0.2 & setosa \\\\\n",
"\t 1.609438 & 3.6 & 0.3364722 & 0.2 & setosa \\\\\n",
"\t 1.686399 & 3.9 & 0.5306283 & 0.4 & setosa \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | \n",
"|---|---|---|---|---|\n",
"| 1.629241 | 3.5 | 0.3364722 | 0.2 | setosa | \n",
"| 1.589235 | 3.0 | 0.3364722 | 0.2 | setosa | \n",
"| 1.547563 | 3.2 | 0.2623643 | 0.2 | setosa | \n",
"| 1.526056 | 3.1 | 0.4054651 | 0.2 | setosa | \n",
"| 1.609438 | 3.6 | 0.3364722 | 0.2 | setosa | \n",
"| 1.686399 | 3.9 | 0.5306283 | 0.4 | setosa | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n",
"1 1.629241 3.5 0.3364722 0.2 setosa \n",
"2 1.589235 3.0 0.3364722 0.2 setosa \n",
"3 1.547563 3.2 0.2623643 0.2 setosa \n",
"4 1.526056 3.1 0.4054651 0.2 setosa \n",
"5 1.609438 3.6 0.3364722 0.2 setosa \n",
"6 1.686399 3.9 0.5306283 0.4 setosa "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% mutate_at(c(\"Sepal.Length\", \"Petal.Length\"), log) %>% head"
]
},
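{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same `across()` replacement works for columns picked by name, assuming dplyr 1.0.0 or newer. `across()` accepts bare column names or selection helpers such as `ends_with()`. A sketch (`scoped` and `with_across` are illustrative names):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Scoped form: columns listed in a character vector\n",
"scoped <- iris %>% mutate_at(c(\"Sepal.Length\", \"Petal.Length\"), log)\n",
"\n",
"# across() form: the same two columns given as bare names\n",
"with_across <- iris %>% mutate(across(c(Sepal.Length, Petal.Length), log))\n",
"\n",
"all.equal(scoped, with_across)"
]
},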
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Keeping only the new columns with `transmute`"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Comb.Length | Comb.Width |
\n",
"\n",
"\t6.5 | 3.7 |
\n",
"\t6.3 | 3.2 |
\n",
"\t6.0 | 3.4 |
\n",
"\t6.1 | 3.3 |
\n",
"\t6.4 | 3.8 |
\n",
"\t7.1 | 4.3 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" Comb.Length & Comb.Width\\\\\n",
"\\hline\n",
"\t 6.5 & 3.7\\\\\n",
"\t 6.3 & 3.2\\\\\n",
"\t 6.0 & 3.4\\\\\n",
"\t 6.1 & 3.3\\\\\n",
"\t 6.4 & 3.8\\\\\n",
"\t 7.1 & 4.3\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Comb.Length | Comb.Width | \n",
"|---|---|\n",
"| 6.5 | 3.7 | \n",
"| 6.3 | 3.2 | \n",
"| 6.0 | 3.4 | \n",
"| 6.1 | 3.3 | \n",
"| 6.4 | 3.8 | \n",
"| 7.1 | 4.3 | \n",
"\n",
"\n"
],
"text/plain": [
" Comb.Length Comb.Width\n",
"1 6.5 3.7 \n",
"2 6.3 3.2 \n",
"3 6.0 3.4 \n",
"4 6.1 3.3 \n",
"5 6.4 3.8 \n",
"6 7.1 4.3 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% transmute(Comb.Length=Sepal.Length + Petal.Length,\n",
" Comb.Width = Sepal.Width + Petal.Width) %>% head"
]
},
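{
"cell_type": "markdown",
"metadata": {},
"source": [
"`transmute()` behaves like `mutate()` followed by keeping only the newly created columns. Recent dplyr releases (1.1.0 and later) mark `transmute()` as superseded in favour of `mutate(.keep = \"none\")`. A sketch of the equivalence with `mutate()` plus `select()` (`via_transmute` and `via_mutate` are illustrative names):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"via_transmute <- iris %>%\n",
"  transmute(Comb.Length = Sepal.Length + Petal.Length,\n",
"            Comb.Width  = Sepal.Width + Petal.Width)\n",
"\n",
"# mutate() keeps every column, so select() the new ones afterwards\n",
"via_mutate <- iris %>%\n",
"  mutate(Comb.Length = Sepal.Length + Petal.Length,\n",
"         Comb.Width  = Sepal.Width + Petal.Width) %>%\n",
"  select(Comb.Length, Comb.Width)\n",
"\n",
"all.equal(via_transmute, via_mutate)"
]
},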
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Multiple transformations"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Sepal.Length_log | Sepal.Width_log | Petal.Length_log | Petal.Width_log | Sepal.Length_sqrt | Sepal.Width_sqrt | Petal.Length_sqrt | Petal.Width_sqrt |
\n",
"\n",
"\t1.629241 | 1.252763 | 0.3364722 | -1.6094379 | 2.258318 | 1.870829 | 1.183216 | 0.4472136 |
\n",
"\t1.589235 | 1.098612 | 0.3364722 | -1.6094379 | 2.213594 | 1.732051 | 1.183216 | 0.4472136 |
\n",
"\t1.547563 | 1.163151 | 0.2623643 | -1.6094379 | 2.167948 | 1.788854 | 1.140175 | 0.4472136 |
\n",
"\t1.526056 | 1.131402 | 0.4054651 | -1.6094379 | 2.144761 | 1.760682 | 1.224745 | 0.4472136 |
\n",
"\t1.609438 | 1.280934 | 0.3364722 | -1.6094379 | 2.236068 | 1.897367 | 1.183216 | 0.4472136 |
\n",
"\t1.686399 | 1.360977 | 0.5306283 | -0.9162907 | 2.323790 | 1.974842 | 1.303840 | 0.6324555 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|llllllll}\n",
" Sepal.Length\\_log & Sepal.Width\\_log & Petal.Length\\_log & Petal.Width\\_log & Sepal.Length\\_sqrt & Sepal.Width\\_sqrt & Petal.Length\\_sqrt & Petal.Width\\_sqrt\\\\\n",
"\\hline\n",
"\t 1.629241 & 1.252763 & 0.3364722 & -1.6094379 & 2.258318 & 1.870829 & 1.183216 & 0.4472136 \\\\\n",
"\t 1.589235 & 1.098612 & 0.3364722 & -1.6094379 & 2.213594 & 1.732051 & 1.183216 & 0.4472136 \\\\\n",
"\t 1.547563 & 1.163151 & 0.2623643 & -1.6094379 & 2.167948 & 1.788854 & 1.140175 & 0.4472136 \\\\\n",
"\t 1.526056 & 1.131402 & 0.4054651 & -1.6094379 & 2.144761 & 1.760682 & 1.224745 & 0.4472136 \\\\\n",
"\t 1.609438 & 1.280934 & 0.3364722 & -1.6094379 & 2.236068 & 1.897367 & 1.183216 & 0.4472136 \\\\\n",
"\t 1.686399 & 1.360977 & 0.5306283 & -0.9162907 & 2.323790 & 1.974842 & 1.303840 & 0.6324555 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length_log | Sepal.Width_log | Petal.Length_log | Petal.Width_log | Sepal.Length_sqrt | Sepal.Width_sqrt | Petal.Length_sqrt | Petal.Width_sqrt | \n",
"|---|---|---|---|---|---|\n",
"| 1.629241 | 1.252763 | 0.3364722 | -1.6094379 | 2.258318 | 1.870829 | 1.183216 | 0.4472136 | \n",
"| 1.589235 | 1.098612 | 0.3364722 | -1.6094379 | 2.213594 | 1.732051 | 1.183216 | 0.4472136 | \n",
"| 1.547563 | 1.163151 | 0.2623643 | -1.6094379 | 2.167948 | 1.788854 | 1.140175 | 0.4472136 | \n",
"| 1.526056 | 1.131402 | 0.4054651 | -1.6094379 | 2.144761 | 1.760682 | 1.224745 | 0.4472136 | \n",
"| 1.609438 | 1.280934 | 0.3364722 | -1.6094379 | 2.236068 | 1.897367 | 1.183216 | 0.4472136 | \n",
"| 1.686399 | 1.360977 | 0.5306283 | -0.9162907 | 2.323790 | 1.974842 | 1.303840 | 0.6324555 | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length_log Sepal.Width_log Petal.Length_log Petal.Width_log\n",
"1 1.629241 1.252763 0.3364722 -1.6094379 \n",
"2 1.589235 1.098612 0.3364722 -1.6094379 \n",
"3 1.547563 1.163151 0.2623643 -1.6094379 \n",
"4 1.526056 1.131402 0.4054651 -1.6094379 \n",
"5 1.609438 1.280934 0.3364722 -1.6094379 \n",
"6 1.686399 1.360977 0.5306283 -0.9162907 \n",
" Sepal.Length_sqrt Sepal.Width_sqrt Petal.Length_sqrt Petal.Width_sqrt\n",
"1 2.258318 1.870829 1.183216 0.4472136 \n",
"2 2.213594 1.732051 1.183216 0.4472136 \n",
"3 2.167948 1.788854 1.140175 0.4472136 \n",
"4 2.144761 1.760682 1.224745 0.4472136 \n",
"5 2.236068 1.897367 1.183216 0.4472136 \n",
"6 2.323790 1.974842 1.303840 0.6324555 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
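"# funs() is deprecated as of dplyr 0.8.0; a named list of functions gives the same <column>_<function> names\n",
"iris %>% transmute_if(is.numeric, list(log = log, sqrt = sqrt)) %>% head"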
]
},
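{
"cell_type": "markdown",
"metadata": {},
"source": [
"In dplyr 1.0.0 and later, the scoped `_if`, `_at` and `_all` verbs are superseded by `across()`. The sketch below assumes dplyr >= 1.0.0 and should reproduce the previous cell. `across()` uses the same `<column>_<function>` naming, but orders the output column by column (`Sepal.Length_log`, `Sepal.Length_sqrt`, ...) rather than function by function. The name `logged` is illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Assumes dplyr >= 1.0.0, which provides across() and re-exports where()\n",
"logged <- iris %>%\n",
"  transmute(across(where(is.numeric), list(log = log, sqrt = sqrt)))\n",
"\n",
"names(logged)\n",
"head(logged)"
]
},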
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Split-apply-combine with `group_by` and `summarise`\n",
"\n",
"On its own, `summarise()` reduces a whole data frame to a single row of summary values."
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"mean |
\n",
"\n",
"\t5.843333 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|l}\n",
" mean\\\\\n",
"\\hline\n",
"\t 5.843333\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"mean | \n",
"|---|\n",
"| 5.843333 | \n",
"\n",
"\n"
],
"text/plain": [
" mean \n",
"1 5.843333"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% summarise(mean = mean(Sepal.Length))"
]
},
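{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several summaries can be computed in the same `summarise()` call, and the result is still a single row. In this sketch `n()` counts the rows; the output names `n`, `mean` and `sd` and the object name `overall` are arbitrary choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# One row out: each named argument becomes one summary column\n",
"overall <- iris %>%\n",
"  summarise(n = n(),\n",
"            mean = mean(Sepal.Length),\n",
"            sd = sd(Sepal.Length))\n",
"overall"
]
},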
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
\n",
"\n",
"\t876.5 | 458.6 | 563.7 | 179.9 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|llll}\n",
" Sepal.Length & Sepal.Width & Petal.Length & Petal.Width\\\\\n",
"\\hline\n",
"\t 876.5 & 458.6 & 563.7 & 179.9\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | \n",
"|---|\n",
"| 876.5 | 458.6 | 563.7 | 179.9 | \n",
"\n",
"\n"
],
"text/plain": [
" Sepal.Length Sepal.Width Petal.Length Petal.Width\n",
"1 876.5 458.6 563.7 179.9 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>% summarise_if(is.numeric, sum)"
]
},
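{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assuming dplyr >= 1.0.0, the same column totals can be written with `across()`, which replaces `summarise_if()`. This is a sketch; `totals` is an illustrative name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Assumes dplyr >= 1.0.0: sum every numeric column into a one-row result\n",
"totals <- iris %>% summarise(across(where(is.numeric), sum))\n",
"totals"
]
},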
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
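"### Split-apply-combine\n",
"\n",
"`group_by()` splits the rows into groups that share a value of the grouping column(s), `summarise()` applies each summary function to every group separately, and the results are combined into a new data frame with one row per group."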
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Species | count |
\n",
"\n",
"\tsetosa | 50 |
\n",
"\tversicolor | 50 |
\n",
"\tvirginica | 50 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" Species & count\\\\\n",
"\\hline\n",
"\t setosa & 50 \\\\\n",
"\t versicolor & 50 \\\\\n",
"\t virginica & 50 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Species | count | \n",
"|---|---|---|\n",
"| setosa | 50 | \n",
"| versicolor | 50 | \n",
"| virginica | 50 | \n",
"\n",
"\n"
],
"text/plain": [
" Species count\n",
"1 setosa 50 \n",
"2 versicolor 50 \n",
"3 virginica 50 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>%\n",
"  group_by(Species) %>%\n",
"  summarise(count = n())"
]
},
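{
"cell_type": "markdown",
"metadata": {},
"source": [
"Counting rows per group is common enough that dplyr provides `count()` as a shortcut for `group_by()` followed by `summarise(n = n())`. The count column is named `n` by default; `counts` is an illustrative name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Equivalent to group_by(Species) %>% summarise(n = n())\n",
"counts <- iris %>% count(Species)\n",
"counts"
]
},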
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Species | SW.mean | SW.cv |
\n",
"\n",
"\tsetosa | 3.428 | 9.043319 |
\n",
"\tversicolor | 2.770 | 8.827326 |
\n",
"\tvirginica | 2.974 | 9.221802 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lll}\n",
" Species & SW.mean & SW.cv\\\\\n",
"\\hline\n",
"\t setosa & 3.428 & 9.043319 \\\\\n",
"\t versicolor & 2.770 & 8.827326 \\\\\n",
"\t virginica & 2.974 & 9.221802 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Species | SW.mean | SW.cv | \n",
"|---|---|---|\n",
"| setosa | 3.428 | 9.043319 | \n",
"| versicolor | 2.770 | 8.827326 | \n",
"| virginica | 2.974 | 9.221802 | \n",
"\n",
"\n"
],
"text/plain": [
" Species SW.mean SW.cv \n",
"1 setosa 3.428 9.043319\n",
"2 versicolor 2.770 8.827326\n",
"3 virginica 2.974 9.221802"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>%\n",
"  group_by(Species) %>%\n",
"  summarise(SW.mean = mean(Sepal.Width),\n",
"            SW.cv = sd(Sepal.Width) / mean(Sepal.Width))  # coefficient of variation = sd / mean"
]
},
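{
"cell_type": "markdown",
"metadata": {},
"source": [
"The coefficient of variation (CV) is conventionally the standard deviation divided by the mean, a unitless measure of spread relative to the mean. Within one `summarise()` call, later arguments can refer to summaries created earlier in the same call, so the CV can be built from the mean and standard deviation. In this sketch the `SW.sd` column and the `sw_stats` object are illustrative names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"sw_stats <- iris %>%\n",
"  group_by(Species) %>%\n",
"  summarise(SW.mean = mean(Sepal.Width),\n",
"            SW.sd = sd(Sepal.Width),\n",
"            # Later arguments can reuse summaries computed above\n",
"            SW.cv = SW.sd / SW.mean)\n",
"sw_stats"
]
},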
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Species | min | max | mean | median |
\n",
"\n",
"\tsetosa | 4.3 | 5.8 | 5.006 | 5.0 |
\n",
"\tversicolor | 4.9 | 7.0 | 5.936 | 5.9 |
\n",
"\tvirginica | 4.9 | 7.9 | 6.588 | 6.5 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
" Species & min & max & mean & median\\\\\n",
"\\hline\n",
"\t setosa & 4.3 & 5.8 & 5.006 & 5.0 \\\\\n",
"\t versicolor & 4.9 & 7.0 & 5.936 & 5.9 \\\\\n",
"\t virginica & 4.9 & 7.9 & 6.588 & 6.5 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Species | min | max | mean | median | \n",
"|---|---|---|\n",
"| setosa | 4.3 | 5.8 | 5.006 | 5.0 | \n",
"| versicolor | 4.9 | 7.0 | 5.936 | 5.9 | \n",
"| virginica | 4.9 | 7.9 | 6.588 | 6.5 | \n",
"\n",
"\n"
],
"text/plain": [
" Species min max mean median\n",
"1 setosa 4.3 5.8 5.006 5.0 \n",
"2 versicolor 4.9 7.0 5.936 5.9 \n",
"3 virginica 4.9 7.9 6.588 6.5 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>%\n",
"  group_by(Species) %>%\n",
"  summarise_at(\"Sepal.Length\", list(min = min, max = max, mean = mean, median = median))"
]
},
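{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assuming dplyr >= 1.0.0, the same per-species statistics can be computed with `across()`. One naming difference to expect: with a list of functions, `across()` names its outputs `<column>_<function>` even for a single column, so the columns come out as `Sepal.Length_min`, `Sepal.Length_max`, and so on, rather than the bare function names that `summarise_at()` gives here. `sl_stats` is an illustrative name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Assumes dplyr >= 1.0.0; a named list of functions yields <column>_<function> names\n",
"sl_stats <- iris %>%\n",
"  group_by(Species) %>%\n",
"  summarise(across(Sepal.Length,\n",
"                   list(min = min, max = max, mean = mean, median = median)))\n",
"sl_stats"
]
},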
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"Species | Sepal.Length_min | Sepal.Width_min | Petal.Length_min | Petal.Width_min | Sepal.Length_max | Sepal.Width_max | Petal.Length_max | Petal.Width_max |
\n",
"\n",
"\tsetosa | 4.3 | 2.3 | 1.0 | 0.1 | 5.8 | 4.4 | 1.9 | 0.6 |
\n",
"\tversicolor | 4.9 | 2.0 | 3.0 | 1.0 | 7.0 | 3.4 | 5.1 | 1.8 |
\n",
"\tvirginica | 4.9 | 2.2 | 4.5 | 1.4 | 7.9 | 3.8 | 6.9 | 2.5 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllllllll}\n",
" Species & Sepal.Length\\_min & Sepal.Width\\_min & Petal.Length\\_min & Petal.Width\\_min & Sepal.Length\\_max & Sepal.Width\\_max & Petal.Length\\_max & Petal.Width\\_max\\\\\n",
"\\hline\n",
"\t setosa & 4.3 & 2.3 & 1.0 & 0.1 & 5.8 & 4.4 & 1.9 & 0.6 \\\\\n",
"\t versicolor & 4.9 & 2.0 & 3.0 & 1.0 & 7.0 & 3.4 & 5.1 & 1.8 \\\\\n",
"\t virginica & 4.9 & 2.2 & 4.5 & 1.4 & 7.9 & 3.8 & 6.9 & 2.5 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"Species | Sepal.Length_min | Sepal.Width_min | Petal.Length_min | Petal.Width_min | Sepal.Length_max | Sepal.Width_max | Petal.Length_max | Petal.Width_max | \n",
"|---|---|---|\n",
"| setosa | 4.3 | 2.3 | 1.0 | 0.1 | 5.8 | 4.4 | 1.9 | 0.6 | \n",
"| versicolor | 4.9 | 2.0 | 3.0 | 1.0 | 7.0 | 3.4 | 5.1 | 1.8 | \n",
"| virginica | 4.9 | 2.2 | 4.5 | 1.4 | 7.9 | 3.8 | 6.9 | 2.5 | \n",
"\n",
"\n"
],
"text/plain": [
" Species Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min\n",
"1 setosa 4.3 2.3 1.0 0.1 \n",
"2 versicolor 4.9 2.0 3.0 1.0 \n",
"3 virginica 4.9 2.2 4.5 1.4 \n",
" Sepal.Length_max Sepal.Width_max Petal.Length_max Petal.Width_max\n",
"1 5.8 4.4 1.9 0.6 \n",
"2 7.0 3.4 5.1 1.8 \n",
"3 7.9 3.8 6.9 2.5 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"iris %>%\n",
"  group_by(Species) %>%\n",
"  summarise_all(list(min = min, max = max))"
]
},
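{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under the same dplyr >= 1.0.0 assumption, `summarise_all()` becomes `summarise(across(everything(), ...))`. Inside a grouped `summarise()`, `across()` leaves out the grouping column (`Species`), so it is not summarised. The column order differs from `summarise_all()`: each measurement's `_min` and `_max` appear side by side. `ranges` is an illustrative name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"suppressPackageStartupMessages(library(dplyr))\n",
"\n",
"# Assumes dplyr >= 1.0.0; grouping columns are excluded from everything() here\n",
"ranges <- iris %>%\n",
"  group_by(Species) %>%\n",
"  summarise(across(everything(), list(min = min, max = max)))\n",
"ranges"
]
},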
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.4.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}